FROM THE LITTLEWOOD-OFFORD PROBLEM TO 
THE CIRCULAR LAW: UNIVERSALITY OF THE 
SPECTRAL DISTRIBUTION OF RANDOM MATRICES 
Abstract. The famous circular law asserts that if $M_n$ is an $n \times n$ matrix with iid complex entries of mean zero and unit variance, then the empirical spectral distribution (ESD) of the normalized matrix $\frac{1}{\sqrt{n}} M_n$ converges both in probability and almost surely to the uniform distribution on the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$. After a long sequence of partial results that verified this law under additional assumptions on the distribution of the entries, the circular law is now known to be true for arbitrary distributions with mean zero and unit variance. In this survey we describe some of the key ingredients used in the establishment of the circular law at this level of generality, in particular recent advances in understanding the Littlewood-Offord problem and its inverse.



1. ESD of Random Matrices



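For an $n \times n$ matrix $A_n$ with complex entries, let

$$\mu_{A_n} := \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i}$$

be the empirical spectral distribution (ESD) of its eigenvalues $\lambda_i \in \mathbb{C}$, $i = 1, \ldots, n$ (counting multiplicity), thus for instance

$$\mu_{A_n}(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\}) = \frac{1}{n} |\{1 \le i \le n : \operatorname{Re} \lambda_i \le s; \operatorname{Im} \lambda_i \le t\}|$$

for any $s, t \in \mathbb{R}$ (we use $|A|$ to denote the cardinality of a finite set $A$), and

$$\int_{\mathbb{C}} f \, d\mu_{A_n} = \frac{1}{n} \sum_{i=1}^n f(\lambda_i)$$

for any continuous compactly supported $f$. Clearly, $\mu_{A_n}$ is a discrete probability measure on $\mathbb{C}$.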
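As a minimal illustration of the definition (our own sketch, not part of the original text), the Python snippet below evaluates $\mu_{A_n}$ on a quadrant and integrates a test function against it. To avoid needing a numerical eigensolver, it uses a hypothetical upper-triangular matrix, whose eigenvalues counted with multiplicity are simply its diagonal entries; the helper names are illustrative only.

```python
from fractions import Fraction

def esd_from_eigenvalues(eigs):
    """ESD mu = (1/n) * sum of point masses at the eigenvalues, as {eigenvalue: mass}."""
    n = len(eigs)
    mu = {}
    for lam in eigs:
        mu[lam] = mu.get(lam, Fraction(0)) + Fraction(1, n)
    return mu

def esd_quadrant_mass(eigs, s, t):
    """mu({z : Re z <= s, Im z <= t}) = |{i : Re lambda_i <= s, Im lambda_i <= t}| / n."""
    count = sum(1 for lam in eigs if lam.real <= s and lam.imag <= t)
    return Fraction(count, len(eigs))

def esd_integral(eigs, f):
    """Integral of f against mu, i.e. (1/n) * sum_i f(lambda_i)."""
    return sum(f(lam) for lam in eigs) / len(eigs)

# Hypothetical upper-triangular example: its eigenvalues, counted with
# multiplicity, are its diagonal entries 1+i, -2, 1+i.
A = [[1 + 1j, 5, 2],
     [0, -2, 7],
     [0, 0, 1 + 1j]]
eigs = [A[i][i] for i in range(len(A))]
print(esd_quadrant_mass(eigs, 0, 0))  # 1/3: only the eigenvalue -2 lies in the quadrant
```

The repeated eigenvalue $1+i$ receives mass $2/3$, which is what "counting multiplicity" means in the definition.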
A fundamental problem in the theory of random matrices is to compute the limiting distribution of the ESD $\mu_{A_n}$ of a sequence of random matrices $A_n$ with sizes tending to infinity [31]. In what follows, we consider normalized random matrices of the form $A_n = \frac{1}{\sqrt{n}} M_n$, where $M_n = (\xi_{ij})_{1 \le i,j \le n}$ has entries $\xi_{ij}$ that are iid random variables. Such matrices have been studied at least as far back as Wishart [58] (see [?, 15] for more discussion).

One of the first limiting distribution results is the famous semi-circle law of Wigner [57]. Motivated by research in nuclear physics, Wigner studied Hermitian random matrices with (upper triangular) entries being iid random variables with mean zero and variance one. In the Hermitian case, of course, the ESD is supported on the real line $\mathbb{R}$. He proved that the expected ESD of a normalized $n \times n$ Hermitian matrix $\frac{1}{\sqrt{n}} M_n$, where $M_n = (\xi_{ij})_{1 \le i,j \le n}$ has iid gaussian entries $\xi_{ij} \sim N(0, 1)$, converges in the sense of probability measures¹ to the semi-circle distribution

$$\frac{1}{2\pi} \mathbf{1}_{[-2,2]}(x) \sqrt{4 - x^2} \, dx \qquad (1)$$

on the real line, where $\mathbf{1}_E$ denotes the indicator function of a set $E$.

Theorem 1.1 (Semi-circular law for the Gaussian ensemble). [57] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid gaussian variables with mean zero and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}} M_n$ converges in the sense of probability measures to the semi-circle law (1).
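A quick numerical sanity check of the normalization in (1) (our sketch, using the midpoint rule, which is our choice and not from the text): the density should integrate to 1, and, as a standard fact not stated in the survey, its even moments are the Catalan numbers $1, 1, 2, 5, \ldots$; in particular the second moment is 1, matching the unit variance of the entries.

```python
import math

def semicircle_density(x):
    """Density of (1): sqrt(4 - x^2) / (2*pi) on [-2, 2], and 0 outside."""
    if abs(x) >= 2.0:
        return 0.0
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi)

def semicircle_moment(k, steps=100_000):
    """Midpoint-rule approximation of the k-th moment of the semicircle law."""
    h = 4.0 / steps
    total = 0.0
    for j in range(steps):
        x = -2.0 + (j + 0.5) * h
        total += x ** k * semicircle_density(x)
    return total * h

# Even moments should approach the Catalan numbers 1, 1, 2, 5.
for k, catalan in [(0, 1), (2, 1), (4, 2), (6, 5)]:
    print(k, f"{semicircle_moment(k):.4f}", catalan)
```

The square-root behaviour at $\pm 2$ slows the midpoint rule slightly, but with $10^5$ steps the error is far below the printed precision.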



Henceforth we shall say that a sequence $\mu_n$ of random probability measures converges strongly to a deterministic probability measure $\mu$ if, with probability one, $\mu_n$ converges in the sense of probability measures to $\mu$. We also say that $\mu_n$ converges weakly to $\mu$ if for every continuous compactly supported $f$, $\int f \, d\mu_n$ converges in probability to $\int f \, d\mu$, thus $\mathbf{P}(|\int f \, d\mu_n - \int f \, d\mu| > \varepsilon) \to 0$ as $n \to \infty$ for each $\varepsilon > 0$. Of course, strong convergence implies weak convergence; thus for instance in Theorem 1.1, $\mu_{\frac{1}{\sqrt{n}} M_n}$ converges both weakly and strongly to the semicircle law.



Wigner also proved similar results for various other distributions, such as the Bernoulli distribution (in which each $\xi_{ij}$ equals $+1$ with probability 1/2 and $-1$ with probability 1/2). His work has been extended



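¹ We say that a collection $\mu_n$ of probability measures converges to a limit $\mu$ if one has $\int f \, d\mu_n \to \int f \, d\mu$ for every continuous compactly supported function $f$, or equivalently if $\mu_n(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\})$ converges to $\mu(\{z \in \mathbb{C} \mid \operatorname{Re} z \le s; \operatorname{Im} z \le t\})$ for all $s, t$.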
and strengthened in several aspects [?, 2, 36]. The most general form was proved by Pastur [36]:

Theorem 1.2 (Semi-circular law). [36] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid complex random variables with mean zero and variance 1. Then the ESD of $\frac{1}{\sqrt{n}} M_n$ converges (in both the strong and weak senses) to the semi-circle law.

The situation with non-Hermitian matrices is much more complicated, due to the presence of pseudospectrum² that can potentially make the ESD quite unstable with respect to perturbations. The non-Hermitian variant of this theorem, the Circular Law Conjecture, has been raised since the 1950s (see Chapter 10 of [?] or the introduction of [3]).

Conjecture 1.3 (Circular law). Let $M_n$ be the $n \times n$ random matrix whose entries are iid complex random variables with mean zero and variance 1. Then the ESD of $\frac{1}{\sqrt{n}} M_n$ converges (in both the strong and weak senses) to the uniform distribution $\mu := \frac{1}{\pi} \mathbf{1}_{|z| \le 1} \, dz$ on the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$.

The numerical evidence for this conjecture is extremely strong (see e.g. Figure 1). However, there are significant difficulties in establishing this conjecture rigorously, not least of which is the fact that the main techniques used to handle Hermitian matrices (such as moment methods and truncation) cannot be applied to the non-Hermitian model (see [?, Chapter 10] for a detailed discussion). Nevertheless, the conjecture has been intensively worked on for many decades. The circular law was verified for the complex gaussian distribution in [31] and the real gaussian distribution in [12]. An approach to attack the general case was introduced in [?], leading to a resolution of the strong circular law for continuous distributions with bounded sixth moment in [3]. The sixth moment hypothesis in [3] was lowered to a $(2 + \eta)$th moment for any $\eta > 0$ in [1]. The removal of the hypothesis of continuous distribution required some new ideas. In [21] the weak circular law for (possibly discrete) distributions with subgaussian moment was established, with the subgaussian condition relaxed to a fourth moment condition in [35]

² Informally, we say that a complex number $z$ lies in the pseudospectrum of a square matrix $A$ if $(A - zI)^{-1}$ is large (or undefined). If $z$ lies in the pseudospectrum, then small perturbations of $A$ can potentially cause $z$ to fall into the spectrum of $A$, even if it is initially far away from this spectrum. Thus, whenever one has pseudospectrum far away from the actual spectrum, the actual distribution of eigenvalues can depend very sensitively (in the worst case) on the coefficients of $A$. Of course, our matrices are random rather than worst-case, and so we expect the most dangerous effects of pseudospectrum to be avoided; but this of course requires some analytical effort to establish, and deterministic techniques (e.g. truncation) should be used with extreme caution, since they are likely to break down in the worst case.
(see also [19] for an earlier result of similar nature), and then to a $(2 + \eta)$th moment in [22]. Shortly before this last result, the strong circular law assuming a $(2 + \eta)$th moment was established in [51]. Finally, in a recent paper [?], the authors proved this conjecture (in both strong and weak forms) in full generality. In fact, we obtained this result as a consequence of a more general theorem, presented in the next section.
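The pseudospectrum phenomenon described in footnote 2 can be made concrete with a standard textbook example (our illustration, not taken from the survey). The $n \times n$ shift matrix $J$, with ones on the superdiagonal, has every eigenvalue equal to 0, yet changing its bottom-left entry to $\varepsilon = 2^{-50}$ moves an eigenvalue all the way to $z = 0.5$ (the perturbed characteristic polynomial is $\lambda^n - \varepsilon$). The sketch verifies this by checking that $(1, z, z^2, \ldots, z^{n-1})$ is an eigenvector; the function names are hypothetical.

```python
def perturbed_shift(n, eps):
    """n x n upper shift matrix (ones on the superdiagonal, all eigenvalues 0),
    with a single extra entry eps in the bottom-left corner."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        J[i][i + 1] = 1.0
    J[n - 1][0] = eps
    return J

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

n, z = 50, 0.5
eps = z ** n                     # about 8.9e-16, i.e. at machine-precision scale
J_eps = perturbed_shift(n, eps)
v = [z ** k for k in range(n)]   # candidate eigenvector (1, z, ..., z^(n-1))
residual = max(abs(a - z * b) for a, b in zip(matvec(J_eps, v), v))
print(residual)                  # 0.0: z = 0.5 is an exact eigenvalue of J_eps
```

All quantities here are powers of two, so the residual is exactly zero in floating point. For such a matrix the resolvent $(J - zI)^{-1}$ at $z = 0.5$ has norm at least $2^{50}$ even though $z$ is at distance $0.5$ from the spectrum $\{0\}$, which is exactly the instability the footnote warns about.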



2. Universality 



An easy case of Conjecture 1.3 is when the entries $\xi_{ij}$ of $M_n$ are iid complex gaussian. In this case there is the following precise formula for the joint density function of the eigenvalues, due to Ginibre [?] (see also [?], [25] for more discussion of this formula):



$$p(\lambda_1, \ldots, \lambda_n) = c_n \prod_{i<j} |\lambda_i - \lambda_j|^2 \prod_{i=1}^n e^{-|\lambda_i|^2}. \qquad (2)$$

From here one can verify the conjecture in this case by a direct calculation. This was first done by Mehta and also Silverstein in the 1960s:

Theorem 2.1 (Circular law for Gaussian matrices). [34] Let $M_n$ be an $n \times n$ random matrix whose entries are iid complex gaussian variables with mean zero and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}} M_n$ tends to the circular law.

A similar result for the real gaussian ensemble was established in [12]. These methods rely heavily on the strong symmetry properties of such ensembles (in particular, the invariance of such ensembles with respect to large matrix groups such as $O(n)$ or $U(n)$) in order to perform explicit algebraic computations, and do not extend directly to more combinatorial ensembles, such as the Bernoulli ensemble.

The above-mentioned results and conjectures can be viewed as examples of a general phenomenon in probability and mathematical physics, namely, that global information about a large random system (such as limiting distributions) does not depend on the particular distribution of the particles. This is often referred to as the universality phenomenon (see e.g. [9]). The most famous example of this phenomenon is perhaps the central limit theorem.



In view of the universality phenomenon, one can see that Conjecture 1.3 generalizes Theorem 2.1 in the same way that Theorem 1.2 generalizes Theorem 1.1.
Figure 1. Eigenvalue plots of two randomly generated 5000 by 5000 matrices. On the left, each entry was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2. On the right, each entry was an iid Gaussian normal random variable, with probability density function proportional to $\exp(-x^2/2)$. (These two distributions were shifted by adding the identity matrix, thus the circles are centered at (1,0) rather than at the origin.)

A demonstration of the circular law for the Bernoulli and the Gaussian case appears³ in Figure 1.

The universality phenomenon seems to hold even for more general models of random matrices, as demonstrated by Figure 2 and Figure 3.

This evidence suggests that the asymptotic shape of the ESD depends only on the mean and the variance of each entry in the matrix. As mentioned earlier, the main result of [55] (building on a large number of previous results) gives a rigorous proof of this phenomenon in full generality.

For any matrix $A$, we define the Frobenius norm (or Hilbert-Schmidt norm) $\|A\|_F$ by the formula $\|A\|_F := \operatorname{trace}(AA^*)^{1/2} = \operatorname{trace}(A^*A)^{1/2}$.
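Both trace expressions equal $(\sum_{i,j} |a_{ij}|^2)^{1/2}$, the square root of the sum of the squared moduli of the entries. The short sketch below (our own check, with a hypothetical non-square complex matrix and illustrative helper names) computes all three quantities.

```python
import math

def conj_transpose(A):
    """A^*: transpose and complex-conjugate every entry."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frobenius(A):
    """||A||_F as the square root of the sum of |a_ij|^2 over all entries."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

A = [[1 + 2j, 0, 3], [-1j, 2, 1 - 1j]]   # hypothetical 2 x 3 complex matrix
AAs = matmul(A, conj_transpose(A))        # A A^*, a 2 x 2 matrix
AsA = matmul(conj_transpose(A), A)        # A^* A, a 3 x 3 matrix
print(frobenius(A), math.sqrt(trace(AAs).real), math.sqrt(trace(AsA).real))
```

The diagonal of $AA^*$ holds the squared row norms and the diagonal of $A^*A$ the squared column norms, so both traces equal the same total, here 21.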

Theorem 2.2 (Universality principle). Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n = (x_{ij})_{1 \le i,j \le n}$ and $Y_n := (y_{ij})_{1 \le i,j \le n}$ be $n \times n$ random matrices whose entries $x_{ij}$, $y_{ij}$ are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic



³ We thank Phillip Wood for creating the figures in this paper.
Figure 2. Eigenvalue plots of randomly generated $n$ by $n$ matrices of the form $D_n + M_n$, where $n = 5000$. In the left column, each entry of $M_n$ was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2, and in the right column, each entry was an iid Gaussian normal random variable, with probability density function proportional to $\exp(-x^2/2)$. In the first row, $D_n$ is the deterministic matrix $\operatorname{diag}(1, 1, \ldots, 1, 2.5, 2.5, \ldots, 2.5)$, and in the second row $D_n$ is the deterministic matrix $\operatorname{diag}(1, 1, \ldots, 1, 2.8, 2.8, \ldots, 2.8)$ (in each case, the first $n/2$ diagonal entries are 1's, and the remaining entries are 2.5 or 2.8 as specified).



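$n \times n$ matrix satisfying

$$\sup_n \frac{1}{n^2} \|M_n\|_F^2 < \infty. \qquad (3)$$

Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges weakly to zero. If furthermore we make the additional hypothesis that the ESDs

$$\mu_{(\frac{1}{\sqrt{n}} M_n - zI)(\frac{1}{\sqrt{n}} M_n - zI)^*} \qquad (4)$$

converge in the sense of probability measures to a limit for almost every $z$, then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges strongly to zero.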



This theorem reduces the computation of the limiting distribution to the case where one can assume⁴ that the entries have Gaussian (or any

⁴ Some related ideas also appear in [?]. In the context of the central limit theorem, the idea of replacing arbitrary iid ensembles by Gaussian ones goes back to Lindeberg [31], and is sometimes known as the Lindeberg invariance principle;
Figure 3. Eigenvalue plots of two randomly generated 5000 by 5000 matrices of the form $A + BM_nB$, where $A$ and $B$ are diagonal matrices having $n/2$ entries with the value 1 followed by $n/2$ entries with the value 5 (for $A$) and the value 2 (for $B$). On the left, each entry of $M_n$ was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2. On the right, each entry of $M_n$ was an iid Gaussian normal random variable, with probability density function proportional to $\exp(-x^2/2)$.



special) distribution. Combining this theorem (in the case $M_n = 0$) with Theorem 2.1, we conclude



Corollary 2.3. The circular law (Conjecture 1.3) holds in both the weak and strong sense.



It is useful to notice that Theorem 2.2 still holds even when the limiting 
distributions do not exist. 



The proof of Theorem 2.2 relies on several surprising connections between seemingly remote areas of mathematics that have been discovered in the last few years. The goal of this article is to give the reader an overview of these connections and, through them, a sketch of the proof of Theorem 2.2. The first area we shall visit is combinatorics.



3. Combinatorics 



As we shall discuss later, one of the primary difficulties in controlling the ESD of a non-Hermitian matrix $A_n = \frac{1}{\sqrt{n}} M_n$ is the presence of pseudospectrum: complex numbers $z$ for which the resolvent $(A_n - zI)^{-1} = (\frac{1}{\sqrt{n}} M_n - zI)^{-1}$ exists but is extremely large. It is therefore of importance to obtain bounds on this resolvent, which leads one to


see [?] for further discussion, and a formulation of this principle for Hermitian random matrices.
understand for which vectors $v \in \mathbb{C}^n$ the vector $(A_n - zI)v$ is likely to be small. Expanding out the vector $(A_n - zI)v$, one encounters expressions such as $\xi_1 v_1 + \cdots + \xi_n v_n$, where $v_1, \ldots, v_n \in \mathbb{C}$ are fixed and $\xi_1, \ldots, \xi_n$ are iid random variables. The problem of understanding the distribution of such random sums is known as the Littlewood-Offord problem, and we now pause to discuss this problem further.

3.1. The Littlewood-Offord problem. Let $v = \{v_1, \ldots, v_n\}$ be a set of $n$ integers and let $\xi_1, \ldots, \xi_n$ be i.i.d. random Bernoulli variables. Define $S := \sum_{i=1}^n \xi_i v_i$, $p_v(a) := \mathbf{P}(S = a)$ and $p_v := \sup_{a \in \mathbb{Z}} p_v(a)$.
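For small $n$ these quantities can be computed exactly. The sketch below (ours; the dynamic-programming approach and the function names are our choices, not the authors') builds the law of $S$ by adding one term $\xi_i v_i$ at a time, with $\xi_i$ uniform on $\{-1, +1\}$, and reads off $p_v$.

```python
from collections import Counter
from fractions import Fraction

def sum_distribution(v):
    """Exact law of S = xi_1 v_1 + ... + xi_n v_n, xi_i iid uniform on {-1, +1},
    as a Counter {value: probability}, built by adding one xi_i v_i at a time."""
    dist = Counter({0: Fraction(1)})
    for vi in v:
        new = Counter()
        for value, prob in dist.items():
            new[value + vi] += prob / 2
            new[value - vi] += prob / 2
        dist = new
    return dist

def p_v(v):
    """p_v = sup_a P(S = a)."""
    return max(sum_distribution(v).values())

print(p_v([1, 1, 1, 1]))   # 3/8: S is a shifted binomial and P(S = 0) = 6/16
print(p_v([1, 2, 4, 8]))   # 1/16: all 2^4 signed sums are distinct
```

The two examples are the extremes discussed below: repeated coefficients concentrate $S$, while coefficients without additive coincidences spread it over all $2^n$ values.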

In their study of random polynomials, Littlewood and Offord [32] raised the question of bounding $p_v$. They showed that if the $v_i$ are non-zero, then $p_v = O(n^{-1/2} \log n)$⁵. Very soon after, Erdős [13], using Sperner's lemma, gave a beautiful combinatorial proof of the following refinement.

Theorem 3.2. Let $v_1, \ldots, v_n$ be non-zero numbers and $\xi_1, \ldots, \xi_n$ be i.i.d. Bernoulli random variables. Then

$$p_v \le \frac{\binom{n}{\lfloor n/2 \rfloor}}{2^n} = O(n^{-1/2}).$$
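Theorem 3.2 can be spot-checked by brute force. In the sketch below (ours; the random test vectors and helper names are hypothetical), $p_v$ is computed exactly for many random vectors of non-zero integers, possibly repeated and of either sign, and compared with $\binom{n}{\lfloor n/2 \rfloor}/2^n$; the all-ones vector attains the bound.

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb

def p_v(v):
    """Exact p_v = max_a P(sum xi_i v_i = a), xi_i iid uniform on {-1, +1}."""
    dist = Counter({0: Fraction(1)})
    for vi in v:
        new = Counter()
        for value, prob in dist.items():
            new[value + vi] += prob / 2
            new[value - vi] += prob / 2
        dist = new
    return max(dist.values())

def erdos_bound(n):
    """binom(n, floor(n/2)) / 2^n, the bound of Theorem 3.2."""
    return Fraction(comb(n, n // 2), 2 ** n)

rng = random.Random(0)
n = 12
for _ in range(200):
    # random non-zero integer coefficients, possibly repeated, of either sign
    v = [rng.choice([-1, 1]) * rng.randint(1, 5) for _ in range(n)]
    assert p_v(v) <= erdos_bound(n)
print(p_v([1] * n) == erdos_bound(n))   # True: the all-ones vector is extremal
```

Since each $\xi_i$ is symmetric, the sign of $v_i$ does not affect the law of $S$, which is why mixed signs are allowed in the test vectors.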



Notice that the bound is sharp, as can be seen from the example $v := \{1, \ldots, 1\}$, in which case $S$ has a binomial distribution. Many mathematicians realized that while the classical bound in Theorem 3.2 is sharp as stated, it can be improved significantly under additional assumptions on $v$. For instance, Erdős and Moser [14] showed that if the $v_i$ are distinct, then

$$p_v = O(n^{-3/2} \ln n).$$

They conjectured that the logarithmic term is not necessary, and this was confirmed by Sárközy and Szemerédi [12]. Again, the bound is sharp (up to a constant factor), as can be seen by taking $v_1, \ldots, v_n$ to be a proper arithmetic progression such as $1, \ldots, n$. Stanley [?] gave a different proof that also classified the extremal cases.

A general picture was given by Halász, who showed, among other things, that if one forbids more and more additive structure⁶ in the $v_i$, then one gets better and better bounds on $p_v$. One corollary of his results (see [21] or [?, Chapter 9]) is the following.

⁵ We use the usual asymptotic notation in this paper, thus $X = O(Y)$, $Y = \Omega(X)$, $X \ll Y$, or $Y \gg X$ denotes an estimate of the form $|X| \le CY$ where $C$ does not depend on $n$ (but may depend on other parameters). We also let $X = o(Y)$ denote the bound $|X| \le c(n) Y$, where $c(n) \to 0$ as $n \to \infty$.

⁶ Intuitively, this is because the less additive structure one has in the $v_i$, the more likely the sums $S$ are to be distinct from each other. In the most extreme case, if

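Theorem 3.3. Consider $v = \{v_1, \ldots, v_n\}$. Let $R_k$ be the number of solutions to the equation

$$\varepsilon_1 v_{i_1} + \cdots + \varepsilon_{2k} v_{i_{2k}} = 0$$

where $\varepsilon_j \in \{-1, 1\}$ and $i_1, \ldots, i_{2k} \in \{1, 2, \ldots, n\}$. Then

$$p_v = O_k(n^{-2k-1/2} R_k).$$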
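The case $k = 1$ already recovers the earlier bounds. For distinct positive $v_i$, a solution of $\varepsilon_1 v_{i_1} + \varepsilon_2 v_{i_2} = 0$ needs $i_1 = i_2$ and $\varepsilon_2 = -\varepsilon_1$, so $R_1 = 2n$ and Theorem 3.3 gives $p_v = O(n^{-5/2} \cdot n) = O(n^{-3/2})$, the Sárközy-Szemerédi bound; for the all-ones vector $R_1 = 2n^2$, giving $O(n^{-1/2})$ as in Theorem 3.2. The brute-force counter below (our sketch; the function name is illustrative) confirms these counts.

```python
from itertools import product

def R(v, k):
    """Number of solutions of eps_1 v_{i_1} + ... + eps_{2k} v_{i_{2k}} = 0 with
    eps_j in {-1, 1} and indices i_j in {1, ..., n}; brute force over (2n)^(2k) tuples."""
    signed = [s * x for x in v for s in (-1, 1)]   # one entry per pair (i, eps)
    return sum(1 for combo in product(signed, repeat=2 * k) if sum(combo) == 0)

n = 10
print(R(list(range(1, n + 1)), 1), 2 * n)   # distinct positive v_i: R_1 = 2n
print(R([1] * n, 1), 2 * n * n)             # all-ones v: R_1 = 2n^2
```

Each list position in `signed` corresponds to a distinct pair $(i, \varepsilon)$, so repeated values of $v_i$ are still counted as different solutions, as the definition of $R_k$ requires.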



Remark 3.4. Several variants of Theorem 3.3 can be found in [27], [30], [?], [28] and the references therein. The connection between the Littlewood-Offord problem and random matrices was first made in [26], in connection with the question of determining how likely a random Bernoulli matrix is to be singular. The paper [26] in fact inspired much of the work of the authors described in this survey.



3.5. The inverse Littlewood-Offord problem. Motivated by inverse theorems from additive combinatorics, in particular Freiman's theorem (see [12], [?, Chapter 5]) and a variant for random sums in [53, Theorem 5.2] (inspired by earlier work in [26]), the authors [19] brought a different view to the problem. Instead of trying to improve the bound further by imposing new assumptions, we aim to provide the full picture by finding the underlying reason for the probability $p_v$ to be large (e.g. larger than $n^{-A}$ for some fixed $A$).

Notice that the (multi)-set $v$ has $2^n$ subsums, and $p_v \ge n^{-C}$ means that at least $2^n/n^C$ among these take the same value. This suggests that there should be very strong additive structure in the set. In order to determine this structure, one can study examples of $v$ where $p_v$ is large. For a set $A$, we denote by $lA$ the set $lA := \{a_1 + \cdots + a_l \mid a_i \in A\}$.

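A natural example is the following.

Example 3.6. Let $I = [-N, N]$ and $v_1, \ldots, v_n$ be elements of $I$. Since $S \in nI$, by the pigeonhole principle, $p_v \ge \frac{1}{|nI|} = \Omega(\frac{1}{nN})$. In fact, a short consideration yields a better bound. Notice that with probability at least .99, we have $S \in 10\sqrt{n} I$ (by Chebyshev's inequality, since $\mathbf{E} S^2 = \sum_i v_i^2 \le nN^2$), thus again by the pigeonhole principle, we have $p_v = \Omega(\frac{1}{\sqrt{n} N})$. If we set $N = n^C$ for some constant $C$, then

$$p_v = \Omega\Big(\frac{1}{n^{C + 1/2}}\Big). \qquad (5)$$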
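The two-step argument of Example 3.6 gives an explicit, if loose, lower bound: $S$ lands in $[-10\sqrt{n}N, 10\sqrt{n}N]$ with probability at least $0.99$, and that interval contains at most $20\sqrt{n}N + 1$ integers, so $p_v \ge 0.99/(20\sqrt{n}N + 1)$. The sketch below (ours, with hypothetical random test vectors) checks this against exact values of $p_v$.

```python
import math
import random
from collections import Counter
from fractions import Fraction

def p_v(v):
    """Exact p_v = max_a P(sum xi_i v_i = a), xi_i iid uniform on {-1, +1}."""
    dist = Counter({0: Fraction(1)})
    for vi in v:
        new = Counter()
        for value, prob in dist.items():
            new[value + vi] += prob / 2
            new[value - vi] += prob / 2
        dist = new
    return max(dist.values())

def example_3_6_lower_bound(n, N):
    """Chebyshev: P(|S| > 10 sqrt(n) N) <= E S^2 / (100 n N^2) <= 1/100, and the
    interval [-10 sqrt(n) N, 10 sqrt(n) N] holds at most 20 sqrt(n) N + 1 integers."""
    return 0.99 / (20 * math.sqrt(n) * N + 1)

rng = random.Random(1)
n, N = 16, 10
for _ in range(20):
    v = [rng.randint(-N, N) for _ in range(n)]   # elements of I = [-N, N]
    assert p_v(v) >= example_3_6_lower_bound(n, N)
```

In practice $p_v$ is much larger than this bound; the point is only that the pigeonhole argument is rigorous with these explicit constants.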

the $v_i$ are linearly independent over the rationals $\mathbb{Q}$, then the $2^n$ sums $S$ are all distinct, and so $p_v = 1/2^n$ in this case.
The next, and more general, construction comes from additive combinatorics. A very important concept in this area is that of a generalized arithmetic progression (GAP). A set $Q$ of numbers is a GAP of rank $d$ if it can be expressed in the form

$$Q = \{a_0 + x_1 a_1 + \cdots + x_d a_d \mid M_i \le x_i \le M_i' \text{ for all } 1 \le i \le d\}$$

for some $a_0, \ldots, a_d$, $M_1, \ldots, M_d$, $M_1', \ldots, M_d'$.

It is convenient to think of $Q$ as the image of an integer box $B := \{(x_1, \ldots, x_d) \in \mathbb{Z}^d \mid M_i \le x_i \le M_i'\}$ under the linear map

$$\Phi : (x_1, \ldots, x_d) \mapsto a_0 + x_1 a_1 + \cdots + x_d a_d.$$

The numbers $a_i$ are the generators of $Q$, and $\operatorname{Vol}(Q) := |B|$ is the volume of $Q$. We say that $Q$ is proper if this map is one to one, or equivalently if $|Q| = \operatorname{Vol}(Q)$. For non-proper GAPs, we of course have $|Q| < \operatorname{Vol}(Q)$.
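Properness is easy to test for small GAPs by listing $\Phi(x)$ for every point $x$ of the box $B$ and looking for collisions. The sketch below is our own illustration; the function names and the two example GAPs are hypothetical.

```python
from itertools import product

def gap_elements(a0, gens, lows, highs):
    """Values a0 + x_1 a_1 + ... + x_d a_d for every point of the box B
    (lows[i] <= x_i <= highs[i]); the list has one entry per point, so len = Vol(Q)."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lows, highs)]
    return [a0 + sum(x * a for x, a in zip(xs, gens)) for xs in product(*ranges)]

def is_proper(a0, gens, lows, highs):
    """Q is proper iff Phi is one-to-one on B, i.e. |Q| == Vol(Q)."""
    values = gap_elements(a0, gens, lows, highs)
    return len(set(values)) == len(values)

# Rank-2 GAP {x_1 + 100 x_2 : 0 <= x_1 <= 9, 0 <= x_2 <= 4}: proper, |Q| = Vol(Q) = 50.
print(is_proper(0, [1, 100], [0, 0], [9, 4]))
# Same box with generators 1 and 5: collisions such as 5*1 + 0*5 == 0*1 + 1*5.
print(is_proper(0, [1, 5], [0, 0], [9, 4]))
```

In the second case $Q = \{0, 1, \ldots, 29\}$, so $|Q| = 30 < 50 = \operatorname{Vol}(Q)$.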

Example 3.7. Let $Q$ be a proper GAP of rank $d$ and volume $V$. Let $v_1, \ldots, v_n$ be (not necessarily distinct) elements of $Q$. The random variable $S = \sum_{i=1}^n \xi_i v_i$ takes values in the GAP $nQ$. Since $|nQ| \le \operatorname{Vol}(nB) \le n^d V$, the pigeonhole principle implies that $p_v \ge \frac{1}{n^d V}$. In fact, using the same idea as in the previous example, one can improve the bound to $\Omega(\frac{1}{n^{d/2} V})$. If we set $V = n^C$ for some constant $C$, then

$$p_v = \Omega\Big(\frac{1}{n^{C + d/2}}\Big). \qquad (6)$$



The above examples show that if the elements of $v$ belong to a proper GAP with small rank and small cardinality, then $p_v$ is large. A few years ago, the authors [49] showed that this is essentially the only reason:

Theorem 3.8 (Weak inverse theorem). Let $C, \varepsilon > 0$ be arbitrary constants. There are constants $d$ and $C'$ depending on $C$ and $\varepsilon$ such that the following holds. Assume that $v = \{v_1, \ldots, v_n\}$ is a multiset of integers satisfying $p_v \ge n^{-C}$. Then there is a GAP $Q$ of rank at most $d$ and volume at most $n^{C'}$ which contains all but at most $n^{1-\varepsilon}$ elements of $v$ (counting multiplicity).

Remark 3.9. The presence of the small set of exceptional elements is not completely avoidable. For instance, one can add $o(\log n)$ completely arbitrary elements to $v$ and only decrease $p_v$ by a factor of $n^{-o(1)}$ at worst. Nonetheless we expect the number of such elements to be less than what is given by the results here.



The reason we call Theorem 3.8 weak is the fact that the dependence between the parameters is not optimal and does not yet reflect the relations in (5) and (6). Recently, we were able to modify the approach to obtain an almost optimal result.
Theorem 3.10 (Strong inverse theorem). [56] Let $C$ and $1 > \varepsilon$ be positive constants. Assume that

$$p_v \ge n^{-C}.$$

Then there exists a GAP $Q$ of rank $d = O_{C,\varepsilon}(1)$ which contains all but $O_d(n^{1-\varepsilon})$ elements of $v$ (counting multiplicity), where

$$|Q| = O_{C,\varepsilon}(n^{C - d/2 + \varepsilon}).$$

The bound on $|Q|$ matches (6), up to the $\varepsilon$ term. The proofs of Theorems 3.8 and 3.10 use harmonic analysis, combined with results from the theory of random walks and several facts about GAPs. Both theorems hold in a more general setting, where the elements of $v$ are from a torsion-free group. The lower bound $n^{-C}$ on $p_v$ can also be relaxed, but the statement is more technical.



As an application of Theorem 3.10, one can deduce, in a straightforward manner, a slightly weaker version of the forward results mentioned above. For instance, let us show that if the $v_i$ are distinct, then $p_v \le n^{-3/2+\delta}$ for any constant $\delta > 0$. Assume otherwise and set $\varepsilon := \delta/2$. Theorem 3.10 (applied with $C = 3/2 - \delta$) implies that most of $v$ is contained in a GAP $Q$ of rank $d$ and cardinality at most $O(n^{3/2 - \delta - d/2 + \delta/2}) = O(n^{1 - \delta/2}) = o(n)$ (for $d \ge 1$; if $d = 0$ then $|Q| = 1$). But since $v$ has $(1 - o(1))n$ distinct elements in $Q$, we obtain a contradiction.



Next we consider another application of Theorem 3.10, which will be more important in later sections. This theorem enables us to execute very precise counting arguments. Assume that we would like to count the number of (multi)-sets $v$ of integers with $\max |v_i| \le N$ such that $p_v \ge p := n^{-C}$.

Fix $d \ge 1$ and fix⁷ a GAP $Q$ with rank $d$ and volume $V = n^{C - d/2}$. The dominating term will be the number of multi-subsets of size $n$ of $Q$, which is at most

$$V^n = n^{(C - d/2)n} = p^{-n} n^{-dn/2}. \qquad (7)$$



For later purposes, we need a continuous version of this result. Let the $v_i$ be complex numbers. Instead of $p_v$, consider the maximum small ball probability

$$p_v(\beta) := \max_{a \in \mathbb{C}} \mathbf{P}(|S - a| \le \beta).$$
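For intuition, $p_v(\beta)$ can be computed exactly for small $n$. The sketch below is ours and makes a simplifying assumption: the coefficients $v_i$ are real (the text allows complex $v_i$), so a ball is an interval of length $2\beta$, and the maximum over centers $a$ is found with a sliding window over the sorted $2^n$ values of $S$. Function name and example coefficients are hypothetical.

```python
from itertools import product

def small_ball_probability(v, beta):
    """p_v(beta) = max_a P(|S - a| <= beta) for REAL coefficients v (a simplification:
    the text allows complex v_i). Enumerates all 2^n sign patterns, so n must be small."""
    sums = sorted(sum(s * x for s, x in zip(signs, v))
                  for signs in product((-1, 1), repeat=len(v)))
    best, left = 0, 0
    for right in range(len(sums)):
        # an optimal interval [a - beta, a + beta] can be slid until its right end hits a sum
        while sums[right] - sums[left] > 2 * beta:
            left += 1
        best = max(best, right - left + 1)
    return best / len(sums)

v = [1.001, 1.002, 1.004, 1.008]
print(small_ball_probability(v, 0.0))    # 0.0625: all 16 signed sums are distinct
print(small_ball_probability(v, 0.02))   # 0.375: the six sums near 0 fit in one ball
```

The example shows why the continuous version is needed: these coefficients have no exact additive coincidences, so $p_v(0)$ is as small as possible, yet they are nearly equal and $p_v(\beta)$ behaves like that of the all-ones vector once $\beta$ exceeds the small perturbations.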



⁷ A more detailed version of Theorems 3.8 and 3.10 tells us that there are not too many ways to choose the generators of $Q$. In particular, if $N = n^{O(1)}$, the number of ways to fix these is negligible.
Given a small $\beta > 0$ and $p = n^{-O(1)}$, the collection of $v$ such that $|v| = 1$ and $p_v(\beta) \ge p$ is infinite, but we are able to show that it can be approximated by a small set.

Theorem 3.11 (The $\beta$-net Theorem). [?] Suppose that $p = n^{-O(1)}$. Then the set of unit vectors $v = (v_1, \ldots, v_n)$ such that $p_v(\beta) \ge p$ admits a $\beta$-net $\Omega$ (in the infinity norm⁸) of size at most

$$|\Omega| \le p^{-n} n^{-n/2 + o(n)}. \qquad (8)$$

Remark 3.12. A related result (with different parameters) appears in [38]: in our notation, the probability $p$ is allowed to be much smaller, but the net is coarser (essentially, a $\beta\sqrt{n}$-net rather than a $\beta$-net). In terms of random matrices, the results in [38] are better suited to control the extreme tail of such quantities as the least singular value of $A_n - zI$, but require more boundedness conditions on the matrix $A_n$ (and in particular, bounded operator norm) due to the coarser nature of the net.

4. Computer Science 

Our next stop is computer science and numerical linear algebra, and in 
particular the problem of dealing with ill-conditioned matrices, which 
is closely related to the issue of pseudospectrum which is of central 
importance in the circular law problem. 

4.1. Theory vs Practice. Running times of algorithms are frequently estimated by worst-case analysis. But in practice, it has been observed that many algorithms, especially those involving a large matrix, perform significantly better than the worst-case scenario. The most famous example is perhaps the simplex algorithm in linear programming. Here, the basic problem (in its simplest form) is to optimize a goal function $c \cdot x$ under the constraint $Ax \le b$, where $c, b$ are given vectors of length $n$ and $A$ is an $n \times n$ matrix. In the worst-case scenario, this algorithm takes exponential time. But in practice, the algorithm runs extremely well. It is still very popular today, despite the fact that there are many other algorithms proven to have polynomial complexity.

There have been various attempts to explain this phenomenon. In this section we will discuss an influential recent explanation given by Spielman and Teng [?].

⁸ In other words, for any $v$ with $p_v(\beta) \ge p$, there exists $v' \in \Omega$ such that all coefficients of $v - v'$ do not exceed $\beta$ in magnitude.
4.2. The effect of noise. An important issue in the theory of computing is noise, as almost all computational processes are affected by it. By the word noise, we would like to refer to all kinds of errors occurring in a process, due to both humans and machines, including errors in measuring, errors caused by truncations, errors committed in transmitting and inputting the data, etc.

Spielman and Teng [?] pointed out that when we are interested in solving a certain system of equations, because of noise, our computer actually ends up solving a slightly perturbed version of the system. This is the core of their so-called smoothed analysis, which they used to explain the effectiveness of a specific algorithm (such as the simplex method). Interestingly, noise, usually a burden, plays a "positive" role here, as it smooths the inputs randomly, and so prevents a very bad input from ever occurring.

The puzzling question here is, of course: why is the perturbed input typically better than the original (worst-case) input?

In order to give a mathematical explanation, we need to introduce some notation. For an $n \times n$ matrix $M$, the condition number $\kappa(M)$ is defined as

$$\kappa(M) := \|M\| \, \|M^{-1}\|$$

where $\|\cdot\|$ denotes the operator norm. (If $M$ is not invertible, we set $\kappa(M) = \infty$.)
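The condition number is easy to compute by hand for $2 \times 2$ matrices. The sketch below (our illustration, restricted to real matrices; the method and names are our choices) uses $\|M\| = \sigma_{\max}$ and $\|M^{-1}\| = 1/\sigma_{\min}$ for the singular values of $M$, obtains $\sigma_{\max}$ from the trace and determinant of $M^T M$, and computes $\sigma_{\min} = |\det M| / \sigma_{\max}$ to avoid catastrophic cancellation for nearly singular inputs.

```python
import math

def singular_values_2x2(M):
    """(sigma_max, sigma_min) of a real 2 x 2 matrix M."""
    (a, b), (c, d) = M
    T = a * a + b * b + c * c + d * d    # trace(M^T M) = ||M||_F^2
    det = abs(a * d - b * c)             # |det M| = sigma_max * sigma_min
    # sigma_max^2 is the larger eigenvalue T/2 + sqrt(T^2/4 - det^2) of M^T M
    s_max = math.sqrt(T / 2 + math.sqrt(max(T * T / 4 - det * det, 0.0)))
    s_min = det / s_max if s_max > 0 else 0.0   # stable formula for the small one
    return s_max, s_min

def condition_number(M):
    """kappa(M) = ||M|| ||M^{-1}|| = sigma_max / sigma_min, infinite if M is singular."""
    s_max, s_min = singular_values_2x2(M)
    return math.inf if s_min == 0 else s_max / s_min

print(condition_number([[2.0, 0.0], [0.0, 1.0]]))         # 2.0
print(condition_number([[1.0, 1.0], [1.0, 1.0 + 1e-8]]))  # roughly 4e8: nearly singular
print(condition_number([[1.0, 1.0], [1.0, 1.0]]))         # inf
```

The middle example is a small perturbation of the singular matrix in the last line, which illustrates how a matrix can be invertible yet badly conditioned.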

The condition number plays a crucial role in numerical linear algebra; in particular, the condition number $\kappa(M)$ of a matrix $M$ serves as a simplified proxy for the accuracy and stability of most algorithms used to solve the equation $Mx = b$ (see [?, 23], for example). The exact solution $x = M^{-1}b$, in theory, can be computed quickly (by Gaussian elimination, say). However, in practice computers can only represent a finite subset of real numbers, and this leads to two difficulties: the represented numbers cannot be arbitrarily large or small, and there are gaps between two adjacent represented numbers. A quantity which is frequently used in numerical analysis is $\epsilon_{\text{machine}}$, which is half of the distance from 1 to the nearest represented number. A fundamental result in numerical analysis asserts that if one denotes by $\tilde{x}$ the result computed by computers, then the relative error $\frac{\|\tilde{x} - x\|}{\|x\|}$ satisfies

$$\frac{\|\tilde{x} - x\|}{\|x\|} = O(\epsilon_{\text{machine}} \, \kappa(M)).$$
Following the literature, we call $M$ well-conditioned if $\kappa(M)$ is small. For quantitative purposes, we say that an $n$ by $n$ matrix $M$ is well-conditioned if its condition number is polynomially bounded in $n$ (that is, $\kappa(M) \le n^C$ for some constant $C$ independent of $n$).

4.3. Randomly perturbed matrices are well-conditioned. The analysis in [44] is guided by the following fundamental intuition⁹:

Conjecture 4.4. For every input instance, it is unlikely that a slight 
random perturbation of that instance has large condition number. 

More quantitatively. 

Conjecture 4.5. Let $A$ be an arbitrary $n$ by $n$ matrix and let $M_n$ be a random $n$ by $n$ matrix. Then with high probability $A + M_n$ is well-conditioned.



Notice that here one allows $A$ to have a large condition number.

Let us take a look at $\kappa(A + M_n) = \|A + M_n\| \, \|(A + M_n)^{-1}\|$. In order to have $\kappa(A + M_n) = n^{O(1)}$, we want to upper-bound both $\|A + M_n\|$ and $\|(A + M_n)^{-1}\|$. Bounding $\|A + M_n\|$ is easy, since by the triangle inequality

$$\|A + M_n\| \le \|A\| + \|M_n\|.$$

In most models of random matrices, $\|M_n\| \le n^{O(1)}$ with very high probability, so it suffices to assume that $\|A\| \le n^{O(1)}$; thus we assume that the matrix $A$ is of polynomial size compared to the noise level. This is a fairly reasonable assumption for high-dimensional matrices for which the effect of noise is non-negligible¹⁰, and we are going to assume it in the rest of this section.

The remaining problem is to bound the norm of the inverse $\|(A + M_n)^{-1}\|$. An important detail here is how to choose the random matrix $M_n$. In their works [?, ?, 13], Spielman and Teng (and coauthors) set $M_n$ to have iid Gaussian entries (with variance 1) and obtained the

⁹ This conjecture, of course, does not fully explain the phenomenon of smoothed analysis, since it may be that a well-conditioned matrix still causes a difficulty in one's linear algorithms for some other reason, or perhaps the original ill-conditioned matrix did not cause a difficulty in the first place; we thank Alan Edelman for pointing out this subtlety. Nevertheless, Conjecture 4.4 does provide an informal intuitive justification of smoothed analysis, and various rigorous versions of this conjecture were used in the formal arguments in [44]; see Section 1.4 of that paper for further discussion.

¹⁰ In particular, it is naturally associated to the concept of polynomially smoothed analysis from [44].
following bound, which played a critical role in their smoothed analysis.

Theorem 4.6. Let $A$ be an arbitrary $n$ by $n$ matrix and $M_n$ be a random matrix with iid Gaussian entries. Then for any $x > 0$,

$$\mathbf{P}(\|(A + M_n)^{-1}\| \ge x) = O\Big(\frac{\sqrt{n}}{x}\Big).$$



While Spielman-Teng smoothed analysis does seem to have the right philosophy, the choice of $M_n$ is a bit artificial. Of course, the analysis still passes if one replaces Gaussian by a fine enough approximation. A large fraction of problems in linear programming deal with integral matrices, so the noise is perturbation by integers. In other cases, even when the noise has continuous support, the data is strongly truncated. For example, in many engineering problems, one does not keep more than, say, three to five decimal places. Thus, in many situations, the entries of $M_n$ end up having discrete support with relatively small size, which may not even grow with $n$, while the approximation mentioned above would require this support to have size exponential in $n$. Therefore, in order to come up with an analysis that better captures real-life data, one needs a variant of Theorem 4.6 where the entries of $M_n$ have discrete support.

This problem was suggested to the authors by Spielman a few years ago. Using the Weak Inverse Theorem, we were able to prove the following variant of Theorem 4.6 [?]:



Theorem 4.7. For any constants $a, c > 0$, there is a constant $b = b(a, c) > 0$ such that the following holds. Let $A$ be an $n$ by $n$ matrix such that $\|A\| \le n^c$ and let $M_n$ be a random matrix with iid Bernoulli entries. Then

$$\mathbf{P}(\|(A + M_n)^{-1}\| \ge n^b) \le n^{-a}.$$
Using the stronger $\beta$-net Theorem, one can obtain a nearly optimal relation between the constants $a$, $b$ and $c$ [?]. These results extend, with the same proof, to a large variety of distributions. For example, one does not need to require the entries of $M_n$ to be iid¹¹, although independence is crucially exploited in the proofs. Also, one can allow many of the entries to be 0 [50].

Remark 4.8. Results of this type first appear in [37] (see also [33] for 
some earlier related work for the least singualar value of rectangular 
matrices). In the special case where A = and where the entries of 
Mn are iid and have finite fourth moment, Rudelson and Vershynin 



In practice, one would expect the noise at a large entry to have larger variance 
than one at a small entry, due to multiplicative effects. 
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[38] (see also [39], [IQ]) obtained sharp bounds for ||(yl + M„) ^||, using 
a somewhat different method, which rehes on an inverse theorem of a 



shghtly different nature; see Remark 3.12 



The main idea behind the proof of Theorem 4.7, which first appears in [37], is the following. Let $d_i$ be the distance from the $i$-th row vector of $A + M_n$ to the subspace spanned by the remaining rows. Elementary linear algebra (see also (10) below) then gives the bound

$$\|(A + M_n)^{-1}\| = n^{O(1)} \Big(\min_{1 \le i \le n} d_i\Big)^{-1}.$$

Ignoring various factors of $n^{O(1)}$, the main task is then to understand the distribution of $d_i$ for any given $i$.
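A minimal numerical sketch of this bound, assuming plain Python with the standard library only: it builds a hypothetical instance ($A = \frac{1}{2} I$ plus an $8 \times 8$ Bernoulli matrix), computes each $d_i$ by Gram-Schmidt, and inverts $A + M_n$ by Gauss-Jordan elimination. The $i$-th column of $(A + M_n)^{-1}$ is orthogonal to every row except the $i$-th, so it has length exactly $1/d_i$; this gives $(\min_i d_i)^{-1} \le \|(A + M_n)^{-1}\| \le \|(A + M_n)^{-1}\|_F \le \sqrt{n}\,(\min_i d_i)^{-1}$, one concrete form of the $n^{O(1)}$ loss. The helper names `distance_to_span` and `inverse` are illustrative and not taken from the source.

```python
import math
import random


def distance_to_span(x, others):
    """Euclidean distance from x to the span of the vectors in `others`."""
    basis = []  # orthonormal basis, built by modified Gram-Schmidt
    for w in others:
        w = list(w)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:  # skip numerically dependent vectors
            basis.append([wi / norm for wi in w])
    r = list(x)
    for b in basis:  # remove the component of x lying in the span
        c = sum(ri * bi for ri, bi in zip(r, b))
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(sum(ri * ri for ri in r))


def inverse(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


rng = random.Random(1)
n = 8
while True:  # resample in the (unlikely) event that A + M_n is (nearly) singular
    M = [[(0.5 if i == j else 0.0) + rng.choice((-1.0, 1.0)) for j in range(n)]
         for i in range(n)]
    d = [distance_to_span(M[i], M[:i] + M[i + 1:]) for i in range(n)]
    if min(d) > 1e-3:
        break

inv = inverse(M)
# Column i of the inverse is orthogonal to all rows except row i, so its length is 1/d_i.
col_norms = [math.sqrt(sum(inv[r][i] ** 2 for r in range(n))) for i in range(n)]
frob = math.sqrt(sum(v * v for row in inv for v in row))
print(f"1/min d = {1 / min(d):.4f}  ||inverse||_F = {frob:.4f}  "
      f"sqrt(n)/min d = {math.sqrt(n) / min(d):.4f}")
```

The Frobenius norm is used only because it is easy to compute without an SVD; it is sandwiched between the operator norm and $\sqrt{n}$ times it.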

If $v = (v_1, \dots, v_n)$ is the unit normal vector of a hyperplane $V$ through the origin, then the distance from a random vector $(a_1 + \xi_1, \dots, a_n + \xi_n)$ to $V$ is given by the formula

$$|v_1(\xi_1 + a_1) + \cdots + v_n(\xi_n + a_n)| = \Big|\sum_{i=1}^{n} a_i v_i + S\Big|,$$

where $S := \sum_{i=1}^{n} v_i \xi_i$ as in the previous section.
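As a sanity check of this formula (a sketch with hypothetical numbers, standard library only), the code below takes a unit normal $v \in \mathbb{R}^4$, spans $V = v^\perp$ by the vectors $e_j - v_j v$, and compares the distance obtained by orthogonal projection onto $V$ with $|\sum_i a_i v_i + S|$ for one fixed sign pattern $\xi$. The vectors `a` and `xi` are arbitrary illustrative choices.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def distance_to_span(x, others):
    """Distance from x to span(others), via modified Gram-Schmidt."""
    basis = []
    for w in others:
        w = list(w)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # the n spanning vectors have rank n - 1; skip the dependent one
            basis.append([wi / norm for wi in w])
    r = list(x)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))


raw = [1.0, 2.0, -2.0, 4.0]
v = [x / math.sqrt(dot(raw, raw)) for x in raw]  # unit normal (raw has norm 5)
n = len(v)
# The vectors e_j - v_j * v (j = 1..n) span the hyperplane V = v^perp.
V_spanning = [[(1.0 if i == j else 0.0) - v[j] * v[i] for i in range(n)] for j in range(n)]

a = [0.3, -1.0, 0.5, 2.0]  # deterministic part of the row (hypothetical)
xi = [1, -1, -1, 1]        # one Bernoulli sign pattern (hypothetical)
row = [ai + x for ai, x in zip(a, xi)]

S = dot(v, xi)
via_formula = abs(dot(a, v) + S)
via_projection = distance_to_span(row, V_spanning)
print(via_formula, via_projection)  # both approximately 2.06
```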

To estimate the chance that $|\sum_{i=1}^{n} a_i v_i + S| < \beta$, the notion of the small ball probability $P_v(\beta)$ comes naturally. Of course, this quantity depends on the normal vector $v$, and so we now divide into cases depending on the nature of this vector.
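To make $P_v(\beta)$ concrete, the sketch below computes it exactly for small $n$ by enumerating all $2^n$ sign patterns. It assumes the usual definition $P_v(\beta) = \sup_x \mathbf{P}(|S - x| \le \beta)$ with each $\xi_i$ uniform on $\{-1, 1\}$; the helper name `small_ball` is hypothetical. The comparison illustrates the dichotomy used above: a highly structured $v$ such as $(1, \dots, 1)$ has a large small ball probability (of order $n^{-1/2}$), while a less additively structured $v$ such as $(1, 2, \dots, n)$ has a much smaller one.

```python
import itertools


def small_ball(v, beta):
    """Exact sup_x P(|S - x| <= beta) for S = sum_i v_i * xi_i, xi_i uniform in {-1, +1}.

    Enumerates all 2^n sign patterns, so it is only practical for small n.
    """
    sums = sorted(sum(vi * e for vi, e in zip(v, signs))
                  for signs in itertools.product((-1, 1), repeat=len(v)))
    best, lo = 0, 0
    for hi in range(len(sums)):
        # Shrink the window until all sums in it fit in an interval of length 2 * beta.
        while sums[hi] - sums[lo] > 2 * beta:
            lo += 1
        best = max(best, hi - lo + 1)
    return best / len(sums)


n = 10
flat = [1] * n                  # highly structured coefficients
spread = list(range(1, n + 1))  # distinct coefficients 1, 2, ..., n
for beta in (0, 1, 2):
    print(beta, small_ball(flat, beta), small_ball(spread, beta))
```

For `flat` and $\beta = 0$ the value is $\binom{10}{5}/2^{10} = 252/1024$, the probability that $S = 0$.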



If $P_v(\beta)$ is small, we can conclude using a conditioning argument. On the other hand, the $\beta$-net Theorem says that there are "few" $v$ such that $P_v(\beta)$ is large, and in this case a direct counting argument finishes the job. Details can be found in [50] or [51].



Intuitively, the idea of this conditioning argument is to first fix (or "condition" on) $n - 1$ of the rows of $A + M_n$, which should then fix the normal vector $v$. The remaining row is independent of the other $n - 1$ rows, and so should have a probability at most $P_v(\beta)$ of lying within $\beta$ of the span of those rows. There are some minor technical issues in making this argument (which essentially dates back to [29]) rigorous, arising from the fact that the $n - 1$ rows may be too degenerate to accurately control $v$, but these difficulties can be dealt with, especially if one is willing to lose factors of $n^{O(1)}$ in various places.

For instance, one important class of $v$ for which $P_v(\beta)$ tends to be large are the compressible vectors $v$, in which most of the entries are close to zero. Each compressible $v$ (e.g. $v = (1, -1, 0, \dots, 0)$) has a moderately large probability of being close to a normal vector for $A + M_n$ (e.g. in the random Bernoulli case, $v = (1, -1, 0, \dots, 0)$ has a probability of about $2^{-n}$ of being a normal vector); but the number (or more precisely, the metric entropy) of the set of compressible vectors is small (of size $2^{o(n)}$), and so the net contribution of these vectors is manageable. Similar arguments (relying heavily on the $\beta$-net theorem) handle the other cases when $P_v(\beta)$ is large (e.g. if most entries of $v$ lie near a GAP of controlled size).
5.1. The replacement principle. Let us now take another look at the Circular Law Conjecture. Recall that $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A_n = \frac{1}{\sqrt{n}} M_n$, which generate a normalized counting measure $\mu_{A_n}$. We want to show that $\mu_{A_n}$ tends (in probability) to the uniform measure $\mu$ on the unit disk.

The traditional way to attack this conjecture is via a Stieltjes transform technique, following [3]. Given a (complex) measure $\nu$, define, for any $z$ with $\operatorname{Im} z > 0$,

$$s_\nu(z) := \int \frac{1}{x - z}\, d\nu(x).$$

For the ESD $\mu_{A_n}$, we have

$$s_n(z) := s_{\mu_{A_n}}(z) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i - z}.$$



Thanks to standard results from probability, in order to establish the Circular Law Conjecture in the strong (resp. weak) sense, it suffices to show that $s_n(z)$ converges almost surely (resp. in probability) to $s_\mu(z)$ for almost all $z$ (see [55] for a precise statement).

Set $z =: s + it$ and $s_n(z) =: S + iT$. Since $s_n$ is analytic except at the poles, and vanishes at infinity, the Stieltjes transform $s_n(z)$ is determined by its real part $S$. Let us take a closer look at this variable:



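$$S = \frac{1}{n} \sum_{i=1}^{n} \frac{\operatorname{Re}(\lambda_i) - s}{|\lambda_i - z|^2} = -\frac{1}{2n} \sum_{i=1}^{n} \frac{\partial}{\partial s} \log|\lambda_i - z|^2 = -\frac{1}{2} \frac{\partial}{\partial s} \int_0^\infty \log x \, d\nu_n(x),$$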



The more classical moment method, which is highly successful in the Hermitian setting (for instance in proving Theorem 1.2), is not particularly effective in the non-Hermitian setting, because moments such as $\operatorname{trace}(A_n^m)$ for $m = 0, 1, 2, \dots$ do not determine the ESD $\mu_{A_n}$ (even approximately) unless one takes $m$ to be as large as $n$; see [3], [4] for further discussion.

One can also use the theory of logarithmic potentials for this, as is done for instance in [21], [55].
where

$$\nu_n := \mu_{(\frac{1}{\sqrt{n}} M_n - zI)(\frac{1}{\sqrt{n}} M_n - zI)^*}$$

is the normalized counting measure of the (squares of the) singular values of $\frac{1}{\sqrt{n}} M_n - zI$. Notice that in the third equality, we use the fact that $\prod_{i=1}^{n} |\lambda_i - z| = |\det(\frac{1}{\sqrt{n}} M_n - zI)|$. This step is critical as it reduces the study of a complex measure to a real one, or in other words to the study of the ESD of a Hermitian matrix rather than a non-Hermitian one.
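The chain of equalities above can be checked numerically. The sketch below uses a hypothetical list of eigenvalues (for instance the diagonal of a triangular $A_n$, so that $\prod_i |\lambda_i - z| = |\det(A_n - zI)|$ holds by construction) and compares $S = \operatorname{Re} s_n(z)$ computed directly with $-\frac{1}{2n}\frac{\partial}{\partial s}\log|\det(A_n - zI)|^2$, the derivative being approximated by a central finite difference in $s = \operatorname{Re} z$.

```python
import math

# Hypothetical eigenvalues lambda_1, ..., lambda_n (e.g. the diagonal of a triangular A_n).
lams = [0.3 + 0.4j, -0.5 + 0.1j, 0.2 - 0.7j, 0.9j, -0.6 - 0.3j]
n = len(lams)


def stieltjes(z):
    """s_n(z) = (1/n) * sum_i 1 / (lambda_i - z)."""
    return sum(1 / (lam - z) for lam in lams) / n


def log_abs_det_sq(z):
    """log |det(A_n - zI)|^2 = sum_i log |lambda_i - z|^2."""
    return sum(math.log(abs(lam - z) ** 2) for lam in lams)


z = 0.1 + 0.2j
S_direct = stieltjes(z).real
h = 1e-5  # step of the central difference in s = Re z
dlogdet_ds = (log_abs_det_sq(z + h) - log_abs_det_sq(z - h)) / (2 * h)
S_from_logdet = -dlogdet_ds / (2 * n)
print(S_direct, S_from_logdet)  # agree up to finite-difference error
```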



Putting this observation in the more general setting of Theorem 2.2, we arrive at the following useful result.

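Theorem 5.2 (Replacement principle) [55]. Suppose for each $n$ that $A_n, B_n \in M_n(\mathbb{C})$ are ensembles of random matrices. Assume that

(i) The expression

$$\frac{1}{n^2}\|A_n\|_2^2 + \frac{1}{n^2}\|B_n\|_2^2 \qquad (9)$$

is weakly (resp. strongly) bounded, where $\|\cdot\|_2$ denotes the Frobenius (Hilbert-Schmidt) norm.
(ii) For almost all complex numbers $z$,

$$\frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}A_n - zI\Big)\Big| - \frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}B_n - zI\Big)\Big|$$

converges weakly (resp. strongly) to zero. In particular, for each fixed $z$, these determinants are non-zero with probability $1 - o(1)$ for all $n$ (resp. almost surely non-zero for all but finitely many $n$).

Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges weakly (resp. strongly) to zero.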



At a technical level, this theorem reduces Theorem 2.2 to the comparison of $\log|\det(\frac{1}{\sqrt{n}}A_n - zI)|$ and $\log|\det(\frac{1}{\sqrt{n}}B_n - zI)|$.
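The comparison in (ii) can be explored numerically. The sketch below (standard library only; the parameters $n = 120$, real $z \in \{0.5, 2\}$, Bernoulli $A_n$ and Gaussian $B_n$ are hypothetical choices) computes $\frac{1}{n}\log|\det(\frac{1}{\sqrt{n}}X - zI)|$ by Gaussian elimination for both ensembles. As a reference it also prints the logarithmic potential of the uniform measure on the unit disk, $\int \log|w - z|\,d\mu(w)$, which equals $(|z|^2 - 1)/2$ for $|z| < 1$ and $\log|z|$ for $|z| \ge 1$; for iid ensembles obeying the circular law both normalized log-determinants are expected to approach this value, so at this size agreement is only approximate.

```python
import math
import random


def log_abs_det(m):
    """log|det(m)| via Gaussian elimination with partial pivoting (modifies m in place)."""
    n = len(m)
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]  # a row swap only flips the sign of det
        p = m[col][col]
        if p == 0.0:
            return float("-inf")  # exactly singular
        total += math.log(abs(p))
        for r in range(col + 1, n):
            f = m[r][col] / p
            if f != 0.0:
                row_r, row_c = m[r], m[col]
                for j in range(col + 1, n):
                    row_r[j] -= f * row_c[j]
    return total


def normalized_log_det(sample_entry, n, z, rng):
    """(1/n) log|det(X / sqrt(n) - z I)| for an n x n matrix X with iid entries, real z."""
    scale = 1.0 / math.sqrt(n)
    m = [[sample_entry(rng) * scale - (z if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return log_abs_det(m) / n


def disk_log_potential(z):
    """Integral of log|w - z| against the uniform probability measure on the unit disk."""
    r = abs(z)
    return math.log(r) if r >= 1 else (r * r - 1) / 2


def bernoulli(rng):
    return rng.choice((-1.0, 1.0))


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


rng = random.Random(2024)
n = 120
results = {}
for z in (0.5, 2.0):
    results[z] = (normalized_log_det(bernoulli, n, z, rng),
                  normalized_log_det(gaussian, n, z, rng),
                  disk_log_potential(z))
    print(f"z={z}: Bernoulli {results[z][0]:.4f}, Gaussian {results[z][1]:.4f}, "
          f"disk potential {results[z][2]:.4f}")
```

Only singular values enter $|\det|$, so a real $z$ is a legitimate test point here even though real matrices have some eigenvalues on the real axis.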

Remark 5.3. Note that this expression is large and unstable when $z$ lies in the pseudospectrum of either $\frac{1}{\sqrt{n}}A_n$ or $\frac{1}{\sqrt{n}}B_n$, which means that the resolvent $(\frac{1}{\sqrt{n}}A_n - zI)^{-1}$ or $(\frac{1}{\sqrt{n}}B_n - zI)^{-1}$ is large. Controlling the probability of the event that $z$ lies in the pseudospectrum is therefore an important part of the analysis. This technical problem is not an artefact of the method, but is in fact essential to any attempt to control non-Hermitian ESDs for general random matrix models, as such ESDs are extremely sensitive to perturbations of the matrix in regions of pseudospectrum. See [3], [1] for further discussion.

A sequence $x_n$ of non-negative random variables is said to be weakly bounded if $\lim_{C \to \infty} \liminf_{n \to \infty} \mathbf{P}(x_n \le C) = 1$, and strongly bounded if $\limsup_{n \to \infty} x_n < \infty$ with probability 1.
5.4. Treatment of the pole. Using techniques from probability, such as the moment method, one can show that the distributions of the singular values of $\frac{1}{\sqrt{n}}A_n - zI$ and $\frac{1}{\sqrt{n}}B_n - zI$ are asymptotically the same. This, however, is not sufficient to conclude that $\frac{1}{n}\log|\det(\frac{1}{\sqrt{n}}A_n - zI)|$ and $\frac{1}{n}\log|\det(\frac{1}{\sqrt{n}}B_n - zI)|$ are close. As remarked earlier, the main difficulty here is that some of the singular values can be very small and thus significantly influence the value of the logarithm.



Now is where Theorem 4.7 enters the picture. This theorem tells us that (with overwhelming probability) there is no mass between $0$ and (say) $n^{-C}$, for some sufficiently large constant $C$. Using this critical information, with some more work, we obtain:

Theorem 5.5 [51]. The Circular Law holds (with both strong and weak convergence) under the extra condition that the entries have bounded $(2 + \eta)$-th moment, for some constant $\eta > 0$.

Remark 5.6. Shortly after the appearance of [51], Götze and Tikhomirov [22] gave an alternate proof of the weak circular law under these hypotheses, using a variant of Theorem 4.7, which they obtained via a method from [37], [38]. This method is based on a different version of the Weak Inverse Theorem.



5.7. Negative second moment and sharp concentration. At the time it was written, the analysis in [51] looked close to the limit of the method. It took some time to realize where the extra moment condition came from, and even more time to figure out a way to avoid that extra condition. Consider the sums



$$\frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}A_n - zI\Big)\Big| = \frac{1}{n}\sum_{i=1}^{n}\log\sigma_i,$$



In the setting where the matrices $A_n$ and $B_n$ have iid entries, one can use existing results to establish this. In the non-iid case, an invariance principle gives a slightly weaker version of this equivalence; this was observed by Manjunath Krishnapur and appears as an appendix to [55].

In particular, the presence of certain factors of $\log n$ arising from inserting Theorem 4.7 into the normalized log-determinant $\frac{1}{n}\log|\det(\frac{1}{\sqrt{n}}A_n - zI)|$ forces one to establish a convergence rate for the ESD of $\frac{1}{\sqrt{n}}A_n - zI$ which is faster than logarithmic in $n$ in a certain sense. This is what ultimately forces one to assume the bounded $(2 + \eta)$-th moment hypothesis. Actually, the method allows one to relax this hypothesis to that of assuming $\mathbf{E}|x|^2\log^C(2 + |x|) < \infty$ for some absolute constant $C$ (e.g. $C = 16$ will do).
where $\sigma_1 \ge \cdots \ge \sigma_n$ are the singular values of $\frac{1}{\sqrt{n}}A_n - zI$, and

$$\frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}B_n - zI\Big)\Big| = \frac{1}{n}\sum_{i=1}^{n}\log\sigma_i',$$

where $\sigma_1' \ge \cdots \ge \sigma_n'$ are the singular values of $\frac{1}{\sqrt{n}}B_n - zI$.
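For a concrete instance of the identity $\log|\det X| = \sum_i \log\sigma_i(X)$ used in these sums, the sketch below takes a hypothetical $2 \times 2$ real matrix, obtains $\sigma_1^2 \ge \sigma_2^2$ in closed form as the eigenvalues of $X^T X$, and checks that $\log\sigma_1 + \log\sigma_2 = \log|\det X|$.

```python
import math

X = [[1.0, 2.0],
     [0.5, -3.0]]  # hypothetical example; det X = -4

# Entries of the symmetric matrix X^T X = [[p, q], [q, r]].
p = X[0][0] ** 2 + X[1][0] ** 2
q = X[0][0] * X[0][1] + X[1][0] * X[1][1]
r = X[0][1] ** 2 + X[1][1] ** 2

# Its eigenvalues are the squared singular values sigma_1^2 >= sigma_2^2.
mean, radius = (p + r) / 2, math.hypot((p - r) / 2, q)
sigma1, sigma2 = math.sqrt(mean + radius), math.sqrt(mean - radius)

det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
print(sigma1, sigma2, math.log(sigma1) + math.log(sigma2), math.log(abs(det)))
```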

As already mentioned, we know that the bulk of the $\sigma_i$ and $\sigma_i'$ are distributed similarly. For the smallest few, we used the lower bound on $\sigma_n$ (from Theorem 4.7) as a uniform bound to show that their contribution is negligible. This turned out to be wasteful, and we needed the extra moment assumption to compensate for the loss in this step.

In order to remove this assumption, we need to find a way to give a better bound on the other singular values. An important first step is the discovery of the following simple but useful identity.

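The Negative Second Moment Identity [55]. Let $A$ be an $m \times n$ matrix, $m \le n$. Then

$$\sum_{i=1}^{m} \sigma_i(A)^{-2} = \sum_{i=1}^{m} d_i^{-2},$$

where, as usual, $d_i$ is the distance from the $i$-th row of $A$ to the subspace spanned by the other $m - 1$ rows, and $\sigma_1(A) \ge \cdots \ge \sigma_m(A)$ are the singular values of $A$ (if $A$ has rank less than $m$, both sides are infinite).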
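The identity is easy to test numerically. One standard route to a proof, sketched here as an assumption rather than as the argument of [55], is that the eigenvalues of $AA^*$ are $\sigma_1^2, \dots, \sigma_m^2$, so the left-hand side equals $\operatorname{trace}((AA^*)^{-1})$, while the $i$-th diagonal entry of $(AA^*)^{-1}$ equals $d_i^{-2}$ (a Schur complement computation). The sketch below (real entries, a hypothetical $3 \times 6$ Bernoulli matrix, standard library only) compares the two sides and the diagonal entries.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def distance_to_span(x, others):
    """Distance from x to span(others), via modified Gram-Schmidt."""
    basis = []
    for w in others:
        w = list(w)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    r = list(x)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))


def inverse(g):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(g)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(g)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


rng = random.Random(7)
m, n = 3, 6  # hypothetical sizes, m <= n
while True:
    A = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]
    d = [distance_to_span(A[i], A[:i] + A[i + 1:]) for i in range(m)]
    if min(d) > 1e-6:  # both sides are infinite unless A has rank m
        break

gram = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]  # A A^T, eigenvalues sigma_i^2
gram_inv = inverse(gram)
lhs = sum(gram_inv[i][i] for i in range(m))  # trace((A A^T)^{-1}) = sum_i sigma_i^{-2}
rhs = sum(di ** -2 for di in d)              # sum_i d_i^{-2}
print(lhs, rhs)
```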

One can prove this identity using undergraduate linear algebra. With this in hand, the rest of the proof falls into place. Consider the singular values $\sigma_1 \ge \cdots \ge \sigma_n$ involved in our analysis, and use $A$ as shorthand for $\frac{1}{\sqrt{n}}A_n - zI$. To bound $\sigma_{n-k}$ from below (say for even $k$), notice that by the interlacing law

$$\sigma_{n-k}(A) \ge \sigma_{n-k}(A'),$$

where $m := n - k/2$ and $A'$ is the $m \times n$ truncation of $A$ obtained by omitting the last $k/2$ rows. Since $\sigma_i(A') \le \sigma_{n-k}(A')$ for each of the $k/2 + 1$ indices $n - k \le i \le m$, the Negative Second Moment Identity (applied to $A'$) implies

$$\frac{k}{2}\,\sigma_{n-k}(A')^{-2} \le \sum_{i=1}^{m} \sigma_i(A')^{-2} = \sum_{i=1}^{m} d_i^{-2},$$

where $d_i$ now denotes the distance from the $i$-th row of $A'$ to the subspace spanned by the other rows of $A'$, a subspace of codimension $k/2 + 1$.

A possible alternate approach would be to bound the intermediate singular values directly, by adapting the results from [39]. This would, however, require some additional effort; for instance, the results in [39] assume zero mean and bounded operator norm, which is not true in general when considering $\frac{1}{\sqrt{n}}A_n - zI$ for non-zero $z$, assuming only a mean and variance condition on the entries of $A_n$. In any case, the analysis in [33] ultimately goes through a computation of the distances $d_i$, similarly to the approach we present here based on the negative second moment identity.

On the other hand, the right-hand side can be bounded efficiently, thanks to the fact that all $d_i$ are large with overwhelming probability, which, in turn, is a consequence of Talagrand's inequality.

Lemma 5.8 (Distance Lemma) [22], [55]. With probability $1 - n^{-\omega(1)}$, the distance from a random row vector to a subspace of codimension $k$ is at least $c\sqrt{k/n}$ for some absolute constant $c > 0$, as long as $k \gg \log n$.

Thus, with overwhelming probability, $\sum_{i=1}^{m} d_i^{-2} = O(mn/k) = O(n^2/k)$, which implies

$$\sigma_{n-k}(A) \ge \sigma_{n-k}(A') \gg \frac{k}{n}.$$



This lower bound is now sufficient to establish Theorem 2.2 and, with it, the Circular Law in full generality.
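The interlacing and negative-second-moment chain can be checked on a small example. The sketch below is a toy illustration under hypothetical parameters ($n = 12$, $k = 4$, real $z = 0.3$, Bernoulli entries, standard library only), not a test of the asymptotic statement: it forms $A = \frac{1}{\sqrt{n}}A_n - zI$, drops the last $k/2$ rows to get $A'$, computes singular values as square roots of the eigenvalues of $AA^T$ and $A'A'^T$ by the cyclic Jacobi method, and checks $\sigma_{n-k}(A) \ge \sigma_{n-k}(A')$ and $\frac{k}{2}\sigma_{n-k}(A')^{-2} \le \sum_i \sigma_i(A')^{-2} = \sum_i d_i^{-2}$. The Jacobi routine is a swapped-in textbook method chosen only to avoid external libraries.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def distance_to_span(x, others):
    """Distance from x to span(others), via modified Gram-Schmidt."""
    basis = []
    for w in others:
        w = list(w)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    r = list(x)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))


def sym_eigenvalues(s, sweeps=60, tol=1e-22):
    """Eigenvalues (in decreasing order) of a real symmetric matrix, cyclic Jacobi method."""
    a = [row[:] for row in s]
    size = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(size) for j in range(size) if i != j)
        if off < tol:
            break
        for p in range(size - 1):
            for q in range(p + 1, size):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * c
                for row in a:  # a <- a J: rotate columns p and q
                    rp, rq = row[p], row[q]
                    row[p], row[q] = c * rp - sn * rq, sn * rp + c * rq
                rowp, rowq = a[p], a[q]
                for j in range(size):  # a <- J^T a: rotate rows p and q
                    ap, aq = rowp[j], rowq[j]
                    rowp[j], rowq[j] = c * ap - sn * aq, sn * ap + c * aq
    return sorted((a[i][i] for i in range(size)), reverse=True)


def singular_values(rows):
    """Singular values (decreasing) of the matrix with the given rows, via rows @ rows^T."""
    gram = [[dot(x, y) for y in rows] for x in rows]
    return [math.sqrt(max(ev, 0.0)) for ev in sym_eigenvalues(gram)]


rng = random.Random(3)
n, k, z = 12, 4, 0.3  # hypothetical sizes; k is even
m = n - k // 2        # A' keeps the first m rows of A
while True:
    A = [[rng.choice((-1.0, 1.0)) / math.sqrt(n) - (z if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A_trunc = A[:m]
    d = [distance_to_span(A_trunc[i], A_trunc[:i] + A_trunc[i + 1:]) for i in range(m)]
    if min(d) > 1e-2:  # resample if A' is (nearly) rank deficient
        break

sigma_A = singular_values(A)[n - k - 1]  # sigma_{n-k}(A)
sv_trunc = singular_values(A_trunc)
sigma_trunc = sv_trunc[n - k - 1]        # sigma_{n-k}(A')
lhs = (k / 2) * sigma_trunc ** -2
nsmi_left = sum(s ** -2 for s in sv_trunc)  # sum_i sigma_i(A')^{-2}
nsmi_right = sum(di ** -2 for di in d)      # sum_i d_i^{-2}
print(sigma_A, sigma_trunc, lhs, nsmi_left, nsmi_right)
print("smallest d_i:", min(d), " sqrt(codim/n):", math.sqrt((n - m + 1) / n))
```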



6. Open problems 
Our investigation leads to open problems in several areas: 

Combinatorics. Our studies of the Littlewood-Offord problem focus on the linear form $S := \sum_{i=1}^{n} v_i \xi_i$. What can one say about higher degree polynomials?

In [6], it was shown that for a quadratic form $Q := \sum_{1 \le i, j \le n} a_{ij}\xi_i\xi_j$ with non-zero coefficients, $\mathbf{P}(Q = z)$ is $O(n^{-1/8})$. It is simple to improve this bound to $O(n^{-1/4})$ [7]. On the other hand, we conjecture that the truth is $O(n^{-1/2})$, which would be sharp by taking $Q = (\xi_1 + \cdots + \xi_n)^2$. Costello (personal communication) recently improved this bound further, and it looks likely that his approach will lead to the optimal bound, or something close.
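The extremal example $Q = (\xi_1 + \cdots + \xi_n)^2$ can be checked by brute force. The sketch below assumes each $\xi_i$ is uniform on $\{-1, 1\}$, computes $\sup_z \mathbf{P}(Q = z)$ exactly for a quadratic form with coefficient matrix $(a_{ij})$ by enumerating all $2^n$ sign patterns, and prints this concentration rescaled by $\sqrt{n}$ for the all-ones coefficient matrix; the roughly constant output is consistent with the conjectured $O(n^{-1/2})$ being sharp. The helper name `quadratic_concentration` is hypothetical.

```python
import itertools
import math
from collections import Counter


def quadratic_concentration(a):
    """Exact max_z P(Q = z) for Q = sum_{i,j} a[i][j] xi_i xi_j, xi_i uniform in {-1, +1}."""
    n = len(a)
    counts = Counter()
    for xi in itertools.product((-1, 1), repeat=n):
        counts[sum(a[i][j] * xi[i] * xi[j] for i in range(n) for j in range(n))] += 1
    return max(counts.values()) / 2 ** n


scaled = {}
for n in (8, 10, 12):
    ones = [[1] * n for _ in range(n)]  # Q = (xi_1 + ... + xi_n)^2
    scaled[n] = quadratic_concentration(ones) * math.sqrt(n)
    print(n, scaled[n])
```

For $n = 10$ the most likely value is $Q = 4$ (that is, $\xi_1 + \cdots + \xi_{10} = \pm 2$), with probability $2\binom{10}{4}/2^{10} = 420/1024$.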

The situation with higher degrees is much less clear. In [6], a bound of the form $O(n^{-c_k})$ was shown, where $c_k$ is a positive constant depending on $k$, the degree of the polynomial involved. In this bound, $c_k$ decreases very fast with $k$.

Smooth analysis. Spielman-Teng smooth analysis of the simplex algorithm was done with Gaussian noise. It is a very interesting problem to see if one can achieve the same conclusion with discrete noise with fixed support, such as Bernoulli. It would give an even more convincing explanation of the efficiency of the simplex method. As discussed earlier, noise that occurs in practice typically has discrete, small support. (This question was mentioned to us by several researchers, including Spielman, a few years ago.)



As discussed earlier, we now have the discrete version of Theorem 4.6, namely Theorem 4.7.



While Theorem 4.6 plays a very important part in the Spielman-Teng analysis, there are several other parts of the proof that make use of the continuity of the support in subtle ways. It is possible to modify these parts to work for fine enough discrete approximations of the continuous (noise) variables in question. However, to do so it seems that one needs to make the size of the support very large (typically exponential in $n$, the size of the matrix).

Another exciting direction is to consider even more realistic models of 
noise. For instance, 



• In several problems, the matrix may have many frozen entries, namely those which are not affected by noise. In particular, an entry which is zero (by nature of the problem) is likely to stay zero in the whole computation. It is clear that the pattern of the frozen entries will be of importance. For example, if the first column consists of (frozen) zeros, then no matter how the noise affects the rest of the matrix, it will always be singular (and of course ill-conditioned). We hope to classify all patterns where theorems such as Theorem 2.2 are still valid.

• In non-frozen places, the noise could have different distribu- 
tions. It is natural to think that the error at a large entry 
should have larger variance than the one occurring at a smaller 
entry. 



Some preliminary results in these directions are obtained in [50]. However, we are still at the very beginning of the road, and much remains to be done.



Circular Law. A natural question here is to investigate the rate of convergence. In [51], we observed that under the extra assumption that the $(2 + \varepsilon)$-th moments of the entries are bounded, we can obtain a rate of convergence of order $n^{-\delta}$, for some positive constant $\delta$ depending on $\varepsilon$. The exact dependence between $\varepsilon$ and $\delta$ is not clear.

Another question concerns the determinant of random matrices. It is known, and not hard to prove, that $\log|\det M_n|$ satisfies a central limit theorem when the entries of $M_n$ are iid Gaussian; see [20], [8]. Girko [20] claimed that the same result holds for much more general models of matrices. We, however, are unable to verify his arguments. It would be nice to have an alternative proof.